Skip to content

feat: support manual cleanup sandboxes#446

Open
hittyt wants to merge 10 commits intoalibaba:mainfrom
hittyt:feat/non-expire-sandbox
Open

feat: support manual cleanup sandboxes#446
hittyt wants to merge 10 commits intoalibaba:mainfrom
hittyt:feat/non-expire-sandbox

Conversation

@hittyt
Copy link
Collaborator

@hittyt hittyt commented Mar 14, 2026

Summary

  • Add manual cleanup sandbox lifecycle support by treating omitted or null timeout as non-expiring sandbox creation.
  • Update Docker lifecycle handling, timeout limit validation, SDKs, tests, and docs to support nullable expiresAt and explicit cleanup semantics.
  • Document the controlled rollout path: upgrade SDKs first, then upgrade the server, then start creating manual-cleanup sandboxes.

Testing

  • Not run (explain why)
  • Unit tests
  • Integration tests
  • e2e / manual verification

Breaking Changes

  • None
  • Yes (describe impact and migration path)

Impact:

  • Lifecycle API responses may now return expiresAt = null for manual-cleanup sandboxes.
  • Older SDKs that assume expiresAt is always a timestamp may fail when they create, get, or list manual-cleanup sandboxes.

Migration path:

  • Upgrade SDKs/clients first.
  • Upgrade the server second.
  • Only then start creating sandboxes with timeout omitted or null.
  • Do not introduce manual-cleanup sandboxes into shared environments while older lifecycle API readers are still active.

Checklist

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d45c49734f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@hittyt
Copy link
Collaborator Author

hittyt commented Mar 14, 2026

@codex review

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7e35ecba27

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@hittyt
Copy link
Collaborator Author

hittyt commented Mar 14, 2026

@codex review

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cfeef4cca0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@hittyt hittyt force-pushed the feat/non-expire-sandbox branch from cfeef4c to 639e9cb Compare March 15, 2026 00:57
@hittyt
Copy link
Collaborator Author

hittyt commented Mar 15, 2026

@codex review

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 639e9cb934

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@hittyt hittyt requested a review from Pangjiping March 15, 2026 07:03
Pangjiping
Pangjiping previously approved these changes Mar 15, 2026
Copy link
Collaborator

@Pangjiping Pangjiping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I'll be working with @Generalwin to get the kubernetes_service work done.

@ctlaltlaltc
Copy link
Contributor

Thanks the great feature for long-running workloads.

hittyt added 3 commits March 16, 2026 15:40
Move _parse_and_accumulate_mount_refs() before the expires/manual-cleanup
branch so that OSSFS ref counts are rebuilt for all sandbox containers on
server restart, not just those with expiration timers.

Made-with: Cursor
@hittyt
Copy link
Collaborator Author

hittyt commented Mar 16, 2026

@codex review

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1aa1b164a5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

hittyt added 2 commits March 16, 2026 16:42
TOML has no null literal; advise operators to comment out the key
instead of setting it to null.

Made-with: Cursor
ge=60,
description=(
"Maximum allowed sandbox TTL in seconds for requests that specify timeout. "
"Omit from config to disable the server-side upper bound."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The description says "Omit from config to disable the server-side upper bound", but this field has default 86400, so omitting it still enables the cap. Could we align behavior and docs here (either default None, or update wording to reflect default cap)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

log_level = "INFO"
# api_key = "your-secret-api-key" # Optional: Uncomment to enable API key authentication
# eip = "1.2.3.4" # Optional: External IP/hostname for endpoint URLs when returning sandbox endpoints
# Maximum TTL for sandboxes that specify timeout. Comment out this line to disable the upper bound.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment says commenting out this line disables the upper bound, but with ServerConfig.max_sandbox_timeout_seconds defaulting to 86400, commenting it out does not disable the cap in practice. Please align guidance with runtime behavior.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Sandbox timeout in seconds. The sandbox will automatically terminate after this duration.
SDK clients should provide a default value (e.g., 3600 seconds / 1 hour).
The maximum is controlled by the server configuration (`server.max_sandbox_timeout_seconds`).
Omit or set null to disable automatic expiration and require explicit cleanup.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

timeout: null is documented here as manual-cleanup mode, but actual support is runtime/provider dependent (Kubernetes may reject when provider capability is false). Could we make this constraint explicit in the schema/docs to avoid client expectation mismatch?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added necessary description.

expires_at = None
if request.timeout is not None:
expires_at = calculate_expiration_or_raise(created_at, request.timeout)
elif not self.workload_provider.supports_manual_cleanup():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation here correctly gates manual cleanup by provider capability. To keep API contract clear, please ensure lifecycle spec/docs explicitly state this is runtime/provider-dependent behavior (not universally available across all runtimes).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are aiming to align this behavior for all runtime providers, however, k8s runtime implementation is not supported via this PR to avoid too large code change.

K8s runtime implementation should be provided via another PR.

hittyt added 3 commits March 16, 2026 18:04
Align default with documentation: omitting the key now truly disables
the server-side upper bound instead of silently applying 86400.

Made-with: Cursor
Kubernetes providers may reject null timeout when the workload provider
does not support non-expiring sandboxes. Make this constraint explicit
in the OpenAPI spec and schema description.

Made-with: Cursor
Self-hosted runner's persisted build cache can serve stale class files
from a previous branch, causing compilation errors when new methods
(e.g. manualCleanup) are not found. Clean and disable build cache
to ensure a fresh build every run.

Made-with: Cursor
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants